Learning with Limited Supervision by Input and Output Coding

Author

  • Yi Zhang
Abstract

In many real-world applications of supervised learning, only a limited number of labeled examples are available because the cost of obtaining high-quality examples is high or the prediction task is very specific. Even with a relatively large number of labeled examples, the learning problem may still suffer from limited supervision as the dimensionality of the input space or the complexity of the prediction function increases. As a result, learning with limited supervision presents a major challenge to machine learning in practice. With the goal of supervision reduction, this thesis studies the representation, discovery and incorporation of extra input and output information in learning.

Information about the input space can be encoded by regularization. We first design a semi-supervised learning method for text classification that encodes a correlation structure of words inferred from seemingly irrelevant unlabeled text. We then propose a multi-task learning framework with a matrix-normal penalty, which compactly encodes the covariance structure of the joint input space of multiple tasks. To capture structure information that is more general than covariance and correlation, we study a class of regularization penalties on model compressibility. Then we design the projection penalty, which can encode the structure information highlighted by a dimension reduction while controlling the risk of information loss during the reduction.

Information about the output space can be exploited by error correcting output codes. Inspired by composite likelihoods, we propose an improved pairwise coding for multi-label classification, which encodes pairwise label density (as opposed to label comparisons) and decodes using the composite likelihood. We then investigate problem-dependent codes, where the encoding is learned from data instead of being predefined. We first propose a multi-label output code using canonical correlation analysis, where predictability of the code is optimized. We then argue that both discriminability and predictability are critical for multi-label output codes, and propose a max-margin formulation that promotes both discriminative and predictable codes.

We empirically study our methods in a wide spectrum of applications, including document categorization, landmine detection, face recognition, brain signal classification, handwritten digit recognition, house price forecasting, music emotion prediction, medical decision, email analysis, gene function classification, outdoor scene recognition, and so forth. In all these applications, our proposed methods for encoding input and output information lead to significantly improved prediction performance.
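The general idea behind error correcting output codes, which the abstract builds on, can be illustrated with a small sketch. This is not the thesis's pairwise, CCA-based, or max-margin coding; it is the generic scheme only. The codebook, the class names, and the helper functions `hamming` and `decode` are all hypothetical and chosen for illustration. In practice each bit would be predicted by a separately trained binary classifier; here the predicted bits are simply given.

```python
from typing import Dict, List, Sequence

# Hypothetical 6-bit codebook for four classes (illustration only).
# Every pair of codewords differs in exactly 4 positions, so any
# single wrongly predicted bit can still be corrected at decoding time.
CODEBOOK: Dict[str, List[int]] = {
    "A": [0, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 0, 0],
    "C": [1, 1, 0, 0, 1, 1],
    "D": [0, 0, 1, 1, 1, 1],
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits: Sequence[int]) -> str:
    """Return the class whose codeword is closest in Hamming distance."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], predicted_bits))


# Assumed output of six binary classifiers; the fourth bit is wrong
# relative to the codeword for class "B".
noisy_bits = [1, 1, 1, 0, 0, 0]
recovered = decode(noisy_bits)
```

With this codebook, `noisy_bits` is at distance 1 from the codeword for "B" and at distance 3 or more from every other codeword, so decoding recovers "B" despite the bit error. A predefined codebook like this one is the baseline that problem-dependent codes replace with an encoding learned from data.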


Similar resources

A Model for Project Selecting with Limited Resources in Data Envelopment Analysis with Input and Output Fuzzy

In evaluating performance, selecting a subset from a set of solutions with limited resources is essential. If there is more than one input and output, data envelopment analysis optimization models are evaluated, and the performance measurement is based on the weighted output divided by the weighted input. In this research, two optimization models with limited resources are presented from data envelopment ...


The Effect of Input-based and Output-based Focus on Form Instruction on Learning Grammar by Iranian EFL Learners

This quasi-experimental study investigated the effects of input-enhancement and production of sentences, containing the target structures, on learning grammar by Iranian Intermediate EFL learners. Sixty male students in three input, output, and control groups participated in the study. After checking the homogeneity of the participants with a proficiency test, the researchers administered a pre...


The Effect of Input, Input-output and Output-input Modes of Teaching on Vocabulary Learning of Iranian EFL Learners

This study was designed to find which of three presentation modes, i.e. input, input-output, and output-input, is most effective for Iranian EFL learners' vocabulary acquisition. To this end, first 54 out of 64 female students, aged from 19 to 23 years, with an average of 21, were selected out of starter-level EFL learners at the University of Tarbiat Moalem in Bandar Abbas, I...


The Effect of Comprehensible Input and Comprehensible Output on the Accuracy and Complexity of Iranian EFL Learners’ Oral Speech

This study aimed at investigating the relative impact of comprehensible input and comprehensible output on the development of grammatical accuracy and syntactic complexity of Iranian EFL learners’ oral production. Participants were 60 female EFL learners selected from a whole population pool of 80 based on the standard test of IELTS. To investigate the research questions, the participants were ...


Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge

Regis Cardin, Dept. of Computer Science, McGill University. We attempt to combine neural networks with knowledge from speech science to build a speaker independent speech recognition system. This knowledge is utilized in designing the preprocessing, input coding, output coding, output supervision and architectural constraints. To handle the temporal aspect of speech we combine delays, copies of activa...




Publication date: 2012